Selection of Lexical Units for Continuous Speech Recognition of Basque

نویسندگان

  • Karmele López de Ipiña
  • Manuel Graña
  • Nerea Ezeiza
  • M. Hernández
  • Ekaitz Zulueta
  • Aitzol Ezeiza
  • C. Tovar
چکیده

The selection of appropriate Lexical Units (LUs) is an important issue in the development of Continuous Speech Recognition (CSR) systems. Words have been used classically as the recognition unit in most of them. However, proposals of nonword units are beginning to arise. Basque is an agglutinative language with some structure inside words, for which non-word morpheme like units could be an appropriate choice. In this work a statistical analysis of units obtained after morphological segmentation has been carried out. This analysis shows a potential gain of confusion rates in CSR systems, due to the growth of the set of acoustically similar and short morphemes. Thus, several proposals of Lexical Units are analysed to deal with the problem. Measures of Phonetic Perplexity and Speech Recognition rates have been computed using different sets of units and, based on these measures, a set of alternative non-word units have been selected.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

First approach to the selection of lexical units for continuous speech recognition of Basque

The selection of appropriated Lexical Units is an important issue in the Language Model (LM) generation. Word has been used classically as unit in most of the Continuous Speech Recognition systems. However, during the last years proposals of non-word units have begun to appear. Since Basque is an agglutinative language with a certain structure inside the word, the nonword units could be an adeq...

متن کامل

Automatic Morphological Segmentation for Continuous Speech Recognition of Basque

The selection of appropriate Lexical Units (LUs) is an important issue in the development of Continuous Speech Recognition (CSR) systems. Word has been used classically as unit in most of them. However, proposals of non-word units have begun to arise. Since the subject of this study is the Basque language, which is an agglutinative language with a complex structure inside words, non-word units ...

متن کامل

Selection of sublexical units for continuous speech recognition of basque

This paper describes the work carried out to select the most suitable set of Sublexical Units for Continuous Speech Recognition of Basque. Even if there are several dialects in Basque, only one of them has been used to choose the preliminary set of sounds. Bearing in mind this aim, a wide experimentation has been carried out to select Context Independent Phone-Like Units. Then, in order to obta...

متن کامل

Decision Tree-Based Context Dependent Sublexical Units for Continuous Speech Recognition of Basque

This paper presents a new methodology, based on the classical decision trees, to get a suitable set of context dependent sublexical units for Basque Continuous Speech Recognition (CSR). The original method proposed by Bahl [1] was applied as the benchmark. Then two new features were added: a data massaging to emphasise the data and a fast and efficient Growing and Pruning algorithm for DT const...

متن کامل

Evaluation of a Spoken Phonetic Database in Basque Language

In this paper we present the evaluation of a spoken phonetic corpus designed to train acoustic models for Speech Recognition applications in Basque Language. A complete set of acoustic-phonetic decoding experiments was carried out over the proposed database. Context dependent and independent phoneme units were used in these experiments with two different approaches to acoustic modeling, namely ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003